ABSTRACT



Some issues in the design of classroom research on second 
language teaching are discussed, with the intention of helping the researcher 
avoid conceptual pitfalls that may cripple the study later in the process. 
This begins with an examination of concerns in sampling, including definition 
of a population to be studied, alternative sampling strategies (random
sampling, stratified random sampling), sample size, and the generalizability
of the results based on the sample selected. Different types of variables
(dependent, independent, moderator, control, intervening) and their roles in
the research are then explained. A subsequent examination of research designs
first defines treatment, control and experimental groups, and observations,
and then distinguishes different designs, including true experimental,
posttest-only, pretest-posttest (with and without control group), time
series, and nonequivalent group designs. Characteristics, advantages, and 
problems with each design are noted. Validity is defined as the degree to 
which results can be accurately interpreted and effectively generalized, and 
these concepts are discussed further, including examination of threats to 
both internal and external validity. Finally, ethical issues are considered, 
basic guidelines are offered, and sources for further guidance are suggested. 
(Contains 26 references.) (MSE) 
Chapter 5



Designing a Language Study 



James Dean Brown 

University of Hawaii at Manoa 



This paper introduces some of the overarching issues in second language
research. They are issues which must be addressed before conducting a study so
that the researcher can avoid conceptual pitfalls that may cripple the study later
on. The discussion will begin with the considerations involved in sampling a
group, or groups, of subjects to be used in a study. Next, the different types of
variables that researchers define in a study will be covered. Then, some of the
research designs that can be used in second language studies will be explored.
In addition, the factors which may jeopardize the internal and external validity
of language studies are covered. Finally, the ethical issues involved in collecting
data, conducting research, and reporting the results will be discussed.



Sampling 

In language studies, it is often necessary to use sampling techniques. To
understand why such techniques are necessary, it is first important to grasp the
difference between a population and a sample. In research, a population can usually be
defined as the entire group of language speakers or learners that the researcher
wants to study. Unfortunately, few researchers have the resources to study, for
example, the entire population of ESL students studying in American
universities, or the entire population of EFL students in the world, or even all of the male
chemistry students from Germany who are studying in the United States. As a
result, most researchers prefer to use a sample, that is, a subgroup of the students
representative of the given population. By using a sample, data can be practically
and effectively collected, sorted, and organized. There are two basic strategies that
are generally used in language studies for selecting samples from populations.
These strategies are called random sampling and stratified random sampling. For
both approaches, the purpose is to create an accurate sample, or subgroup, which
can be said to be representative of the population as a whole.

Alternative Sampling Strategies 

The underlying principle in random sampling is that each individual member of 
the population must have an equal chance of being selected into the sample. 
Three steps can be used to insure such equality of chance: 

1. Clearly identify the population in which the researcher is interested. 

2. Assign an identification number to each member of the population. 

3. Choose the subjects for the sample on the basis of a table of random numbers. 

A table of random numbers is a list of numbers, usually generated by a
computer, that contains no systematic patterns. Most introductory statistics books
contain such a list (for example, see Appendix A in Shavelson, 1981). Using a table of
random numbers leaves the choices of who will be included in the sample up to a
dispassionate and random table of numbers, rather than up to the researcher who
may have subtle biases (conscious or unconscious) that could affect the results of
the study. Once a large enough number of subjects is randomly selected, the
resulting random sample can be assumed to be representative of the entire
population from which it was drawn (Brown, 1988, pp. 111-113).
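
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster of 1,000 numbered students, the sample size of 50, and the function name simple_random_sample are hypothetical, and a seeded pseudo-random number generator stands in for the printed table of random numbers.

```python
import random

def simple_random_sample(population_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of population_ids has the same chance of being selected,
    which is the defining property of random sampling.
    """
    rng = random.Random(seed)  # plays the role of the table of random numbers
    return rng.sample(list(population_ids), sample_size)

# Step 1: identify the population (a hypothetical roster of 1,000 students).
# Step 2: assign an identification number to each member.
population = range(1, 1001)

# Step 3: let the random number generator, not the researcher, choose.
sample = simple_random_sample(population, sample_size=50, seed=42)
print(sorted(sample)[:10])
```

Fixing the seed makes the selection reproducible, so that the sampling procedure can be documented and checked by others.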

Other, more readily available techniques can be used to obtain a random
sample. For example, the researcher might like to pull numbers out of a hat, use
a deck of cards, or repeatedly throw a pair of dice in selecting subjects for a
sample. Any technique wherein each member of the population has an equal
chance of being selected, thereby ruling out biases on the part of the researcher,
will be acceptable for random sampling, whether the sampling be for selecting
subjects from a population for inclusion in a study, or for separating them into
subgroups within the study itself.

Another strategy that is sometimes used in language studies is called strati- 
fied random sampling. In this case, four steps are usually used: 

1. Clearly identify the population in which the researcher is interested. 

2. Identify the salient characteristics of the population (called strata). 

3. Randomly select members from each of the strata in the population (using a 
table of random numbers or other techniques described above). 

4. Check to insure that the resulting sample has about the same proportions of 
each characteristic as the original population. 

For instance, in the population of all ESL students studying at the University of
Hawaii at Manoa (UHM), it might be useful to identify subgroups, or strata,
within the population based on the following characteristics: gender (male or
female); country of origin; native language; academic status (graduate,
undergraduate, or unclassified); and major (science, humanities, or undeclared). Given
correct information about the proportions of these characteristics in the
population of ESL students at UHM, the researcher could then randomly select from
each of the strata in proportion to those population characteristics. The sample
that results would intentionally take on the same proportional characteristics
found in the entire population. Creating a stratified random sample still requires
random sampling, but has the advantage of providing a certain degree of
precision to the representativeness of the resulting sample, a fact which facilitates
the use of the identified characteristics as variables in the study.
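
The four steps of stratified random sampling can be sketched in the same way. The example below is a simplified illustration, not a prescribed procedure: it stratifies on a single characteristic (academic status), and the population counts of 600, 300, and 100 and the function name stratified_random_sample are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_random_sample(members, stratum_of, sample_size, seed=None):
    """Randomly select from each stratum in proportion to its population share.

    members: list of population members
    stratum_of: function mapping a member to its stratum label
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)

    sample = []
    for label, group in sorted(strata.items()):
        # Proportional allocation, rounded; keep at least one member per stratum.
        quota = max(1, round(sample_size * len(group) / len(members)))
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample

# Hypothetical population: 600 undergraduate, 300 graduate, 100 unclassified.
population = (
    [(i, "undergraduate") for i in range(600)]
    + [(i, "graduate") for i in range(600, 900)]
    + [(i, "unclassified") for i in range(900, 1000)]
)
sample = stratified_random_sample(population, lambda m: m[1], sample_size=100, seed=1)

# Step 4: check that the sample proportions mirror the population.
counts = defaultdict(int)
for _, status in sample:
    counts[status] += 1
print(dict(counts))
```

With these counts the sample contains 60 undergraduate, 30 graduate, and 10 unclassified students, matching the population proportions. Because quotas are rounded, the total can differ slightly from the requested sample size when the proportions do not divide evenly.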

Decisions about which strategy (random or stratified random) to employ in a
particular study must be reached rationally, and in advance. There are several
considerations that must be kept in mind in making such decisions. First, it is
generally useful to employ stratified random sampling when the population in
question is fairly heterogeneous in nature. The concern is that random sampling
might not provide for selection from each of the strata, or subgroups, in the
population. Second, a stratified random sample becomes imperative when the
samples involved will be small or the groupings within the study will be
unequal in size. Third, it must be remembered that, if properly conducted,
stratified random sampling has the advantage of letting the characteristics of the
population determine which strata will be sampled. Hence, the stratified
strategy is useful if the study will focus on the groups’ characteristics as moderator
or control variables (see Brown, 1988, pp. 11-18).

Alternatively, if the samples involved will be fairly large, straightforward
random sampling can be employed. Random sampling is much easier to perform
since there is no need to define the characteristics of the population. It is only
necessary to assume that the sample represents the population from which it
was taken. This assumption is widely accepted in research circles even though
it is counter-intuitive for some language teaching professionals.

Sample Size 

One of the first questions that will arise with regard to sampling is: How big
must a sample be to be considered large enough? There is no easy answer to
this question. However, it is clearly true that a large sample is better (in the
sense of “more representative”) than a small one. Consider a sample which
includes all but 1% of a population of 1,000 language students (that is, a sample
that contains 99% of the population). It is likely that such a sample is more
representative of the population than one containing only 1% of it or 10% or
30%. However, knowing this does not answer the question of how big a sample
must be to be considered large. Unfortunately, sample size decisions depend on
the situation involved in the study as well as upon the types of statistics that will
be used. Statistics teachers will often give rules of thumb like the sample size
should be at least 28 (or 30) per group or per variable. This is not bad advice per
se; however, such rules of thumb are usually vague and imprecise, and in any
case convey only the minimum number that you will need for correctly
applying many of the statistics that come up in research.

Another point of view is that, instead of estimating the bare minimum
number of subjects, the researcher should be estimating the minimum number of
subjects that would be necessary for a statistically significant result to be
obtained (if it really exists in the population) given the application of a particular
statistical procedure under the conditions of the study that is being planned.
Such estimations can be made by using power analysis. One thing that power
analysis can be used for is to analyze the relationship between the probability
of finding a statistically significant result and the sample size given a particular
set of expected results. If, for instance, a researcher wanted to estimate the
number of subjects that would be necessary to find a statistically significant
difference between the means of two groups of subjects, it could be done
mathematically on the basis of pilot data, or other previous research that may be
available in the literature. Such estimates can be made for a variety of the
statistical procedures used for mean comparisons, correlation, and regression,
as well as comparisons of frequencies (for more on power analysis, see Cohen,
1988; Kraemer & Thiemann, 1987; Lipsey, 1990). Unfortunately, power analysis
is mathematically complex. However, there is computer software available (e.g.,
Borenstein & Cohen, 1988) that can resolve this problem.
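
To give a rough sense of what such an estimate looks like for a comparison of two group means, the sketch below uses a common normal approximation, n per group = 2((z_alpha/2 + z_power)/d)^2, where d is the expected standardized mean difference taken from pilot data or previous research. This is an assumption-laden approximation and not the procedure implemented in the sources or software cited above; exact calculations based on the t distribution give slightly larger values. The function name subjects_per_group and the default alpha of .05 and power of .80 are choices made for the illustration.

```python
import math
from statistics import NormalDist

def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed comparison of two means.

    Normal approximation: n = 2 * ((z_alpha/2 + z_power) / d) ** 2,
    where d is the expected standardized mean difference.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Smaller expected differences require far more subjects per group.
for d in (0.2, 0.5, 0.8):
    print(d, subjects_per_group(d))
```

Under these assumptions, an expected difference of half a standard deviation calls for about 63 subjects per group, while a difference of one fifth of a standard deviation calls for nearly 400, which shows why a single rule of thumb cannot serve every study.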

In short, when thinking about sample size, the best strategy is to make sure that
the population is clearly defined, and that the sampling procedures make sense. If
pilot data or other previous research is available, it will prove helpful to use power
analysis to estimate the sample size that is necessary to find a significant effect if it
exists. If pilot data are not available, you may have to design your study such that
the samples involved “seem” large enough to be representative, while keeping in
mind that a good rule of thumb is the larger the sample size the better. The issues
involved in sampling are somewhat subjective, and must in part be left up to the
researcher. Sampling procedures are important partly because of the way that they
affect the generalizability of the study.

The generalizability of a study can be defined as the degree to which the results
are meaningful beyond the study itself with regard to the entire population in
question. If the sampling techniques have been properly conducted and the sample
is large enough, there should be no question in the researcher’s mind (or in a
reader’s mind) as to the degree to which the sample represents the population. If
there is some question, then the sampling techniques should be improved or the
sample sizes increased, or both. (For more information on sampling and its use in
language studies see Brown, 1988; Hatch & Lazaraton, 1991.)

Different Types of Variables

A variable is anything that can vary in a study. However, research is largely the
study of what happens when variables are systematically manipulated in planned
combinations. There are essentially five roles that variables can play in a study:
dependent variables, independent variables, moderator variables, control
variables, and intervening variables.

The dependent variable in a study is the variable of primary focus. It can also
be thought of as the variable that is measured and studied to determine if other
variables have an effect on it, or are related to it. The independent variable in a
study is the variable that has been selected by the researcher in order to study
its effect on the dependent variable (hence, the independent variable is
sometimes also called the manipulated variable). For instance, for a research
question like “What is the effect of X on Y?”, X is the independent variable and Y is
the dependent variable. Or, for a research question like “How well does X
predict Y?”, X is the independent variable and Y is the dependent variable.

The relationship between the independent and dependent variables is central
to any study. However, sometimes the researcher will also want to include a
moderator variable in order to determine the effect of the moderator variable on the
relationship between the dependent and independent variables. Thus, if a
moderator variable were included, a question like the following could be posed: “What
is the effect of X on Y when Z is present or absent?” In this last case, X is the
independent variable, Y is the dependent variable, and Z is a moderator variable.

In language research, there are usually variables other than the dependent,
independent, and moderator variables which cannot be included in the design or
otherwise directly studied. Nonetheless, these variables must be accounted for,
often as control variables. Control variables are variables which are eliminated
from the study, held constant, or otherwise kept from interfering with the study of
the central relationship between the independent and dependent variables. For
instance, in a study of the effect of Method A on English language proficiency (as
measured by TOEFL), the researcher might compare the TOEFL scores of two
groups, one which had been taught by Method A and another, a control group,
which had received no instruction. The researcher would be most interested in the
relationship between the independent variable, Method, and the dependent variable,
English Language Proficiency. However, there are a number of variables which
might interfere with the relationship between Method and English Language
Proficiency: gender, intelligence, aptitude, years of language study, etc. The researcher
might choose to control gender by eliminating all males from the study. The
researcher might further choose to use only students who had studied six years of
English to hold the years-of-study variable constant. Random selection can also be
used to create groups that are theoretically equal on all variables except those
being manipulated as independent, dependent, and moderator variables.






Perhaps the most confusion is caused by the term “intervening” variable
because it is used in two distinctly different ways. On the one hand, intervening
variable is used to describe the construct which underlies the relationship
between the independent and dependent variables. For instance, in the example
study above on the effect of Method A on English Language Proficiency, the
researcher might label the construct underlying the effect as “method effect” or
“learning” or “language acquisition” depending on how it is conceptualized.

On the other hand, intervening variable is used to describe a variable that is
unanticipated in a study, yet surfaces as being a possible explanation for the
relationship between the independent and dependent variables. In the example
study, it might turn out that any difference discovered in the proficiency scores
of the Method A and Method B groups was caused by an unanticipated
intervening variable rather than by the methods themselves. For instance, it might
turn out that the teacher of the Method A class was just a better teacher than the
teacher of the control group. Thus, a teacher effect turns out to be a possible
intervening variable in the sense that it was unanticipated yet has potential
explanatory power.



Research Designs 

To understand the basic designs that are used in quantitative language studies,
it will first be necessary to define some of the fundamental terms that are used.
The first idea that must be understood is that of a treatment. A treatment is
something that the experimenter does to one group so as to study the effects of
the treatment on the people involved. A treatment may be a specific teaching
strategy, application of a set of materials, use of a particular reward system, or
any other experience that the researcher wants to apply to the subjects for the
study. Typically, for the sake of comparison, one group receives the treatment
while another group does not. Thus, the subjects are divided into two or more
groups: a control group and one or more experimental groups. The control
group usually receives no treatment, or a placebo (some substitute that is
predicted to have no effect), while the experimental group receives the treatment.
In a language program, the treatment is likely to be some aspect of the language
teaching or learning experience.

The reason for administering a treatment to the experimental group and
nothing to the control group is to determine whether the treatment has had an
effect. In order to do so, one or more observations must occur which allow for
comparisons of the two types of groups. These observations may take many
forms. In quantitative studies, observations may be simple tallies, rankings, or
test scores. The point in making observations is that something of interest to the
researcher must be observed or measured so that comparisons can be made
between the control and experimental groups. Naturally, whatever is observed
or measured must be related to the treatment. Thus, in a language program, if
the treatment was some form of pedagogy, you might be interested in
observing the language achievement test scores in order to determine the effect of the
treatment on achievement.

It is important to note that studies involving anything other than examination
and description of test scores are difficult to conduct. The study may be
designed in an airtight manner (difficult in any teaching or learning situation), but
in addition, considerable knowledge of statistics must be applied, usually more
than the knowledge provided in one or two statistics courses. This warning is
meant to encourage budding researchers to seek adequate guidance in
designing quantitative studies and analyzing the statistical results.

The two sections that follow will explain two categories of quantitative
studies: true experimental designs and quasi-experimental ones. This is a very
useful distinction explained much more fully in Campbell and Stanley (1963).

True experimental designs are the most controlled language studies. They must
be carefully planned from beginning to end. Hence, they are the closest thing in
language studies to what most teachers believe scientific experiments are like. One
of the keys to identifying a true experimental design is that the subjects in the study
must be randomly selected from the population being studied, and randomly
assigned to the treatment or control group. Randomly is used here strictly in the
sense that it was defined above. As described in the earlier section on Sampling,
this must be done so that every member of the population has an equal chance of
being selected. If these procedures are followed and the resulting groups are large
enough, the researcher is justified in assuming that the two groups have very much
the same characteristics. Thus, true experimental designs have random selection as
a precondition. The same thing is true for posttest-only designs, pretest-posttest
designs, or any combination of the two.

The posttest-only design (one type of true experimental design) is particularly
dependent on random selection because it is assumed on the basis of sampling
theory that the experimental and control groups are equivalent at the outset of
the study. Such a study is designed as shown in Figure 1. Notice that step one is
to use random selection to create equivalent groups. The experimental group
receives the treatment, while the control group does not (or receives a
placebo). Both groups are then observed on the same scale and the performances
of the two groups are compared. If the experimental group has significantly
higher performance than the control group, arguments can then be built that
the treatment has an effect. The degree to which such claims can be made will
naturally depend on the magnitude of the differences in performance.
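
The random assignment step and the final comparison in a posttest-only design can be sketched as follows. Everything specific here is hypothetical: the 60 subject numbers, the function name randomly_assign, and the posttest scores, which are randomly generated placeholders with no treatment effect built in and serve only to show where real observations would enter the comparison.

```python
import random
from statistics import mean

def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Step one: chance alone decides who receives the treatment.
experimental, control = randomly_assign(range(1, 61), seed=7)

# Placeholder posttest scores keyed by subject number; in a real study these
# would be the observations made after the treatment.
rng = random.Random(0)
posttest = {sid: rng.gauss(70, 10) for sid in range(1, 61)}

# Both groups are observed on the same scale and their performances compared.
difference = mean(posttest[s] for s in experimental) - mean(posttest[s] for s in control)
print(f"Mean difference (experimental - control): {difference:.2f}")
```

Whether an observed difference is statistically significant is a separate question that requires an appropriate statistical test.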

Figure 1

True Experimental Design, Posttest Only

The pretest-posttest design, while it also assumes random selection of the two
groups, allows the researcher to check the equivalence of the two groups at the
beginning of the study, usually with a pretest of some sort. Such a study would typically
be laid out as shown in Figure 2. This additional step allows for checking the
equivalence of the two groups in Step 2, but also allows for studying the amount
of gain that has been made by each group between Steps 2 and 4. This potential
for studying gain allows the researcher to consider additional issues. For example,
if there is a difference between the two groups on the posttest, the researcher can
study whether the difference is as large as the difference between the pretest and
posttest performances of the experimental group. If this is not true, the observed
differences may have some source other than, or additional to, the treatment.
Thus, the pretest-posttest design is generally more powerful than the posttest-only
design because more inferences can be drawn. Pretest-posttest designs can
become much more complex, including various types of treatments used
simultaneously and various observation techniques used in the same study.
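
The two checks that a pretest makes possible, equivalence at the outset and gain over the course of the study, amount to simple arithmetic on the scores. The sketch below uses invented scores for five subjects per group purely for illustration; the function name mean_gain is likewise an assumption.

```python
from statistics import mean

def mean_gain(pretest, posttest):
    """Average posttest-minus-pretest gain for one group (paired by position)."""
    return mean(post - pre for pre, post in zip(pretest, posttest))

# Hypothetical scores for illustration only.
experimental_pre = [52, 48, 61, 55, 50]
experimental_post = [63, 58, 70, 66, 59]
control_pre = [51, 49, 60, 56, 50]
control_post = [54, 50, 63, 57, 52]

# Equivalence check: the groups should start out roughly equal on the pretest.
pretest_gap = mean(experimental_pre) - mean(control_pre)

# Gain made by each group between the pretest and the posttest.
experimental_gain = mean_gain(experimental_pre, experimental_post)
control_gain = mean_gain(control_pre, control_post)
print(pretest_gap, experimental_gain, control_gain)
```

In this invented data set the groups have identical pretest means, and the experimental group gains 10 points on average against 2 for the control group. A posttest-only design would show the final difference but could not confirm the starting equivalence or the size of each group's gain.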



Figure 2 

True Experimental Design, Pretest and Posttest

From a practical point of view, true experimental designs are often doomed
in real language teaching settings. First, students are rarely randomly selected.
Thus, many researchers are working either with what is called an intact group
or with the entire population of students when they set out to do a study.
Second, the researcher cannot set aside half of the students, randomly selected
or otherwise, to receive no language training, or a placebo. Either students want
the training or they do not, and language researchers are seldom in the moral or
monetary position to simply withhold treatment (training) from one half while
the other half receives training. As a result, language researchers are more likely
to turn to what is called a quasi-experimental design. Quasi-experimental
designs, though less than perfectly controlled, provide useful alternatives to true
experimental designs. Quasi-experimental designs are adequate for the
purposes of studying many language issues, particularly if no sweeping claims are
going to be generalized from the results. According to Campbell and Stanley
(1963), the main characteristic that makes a quasi-experimental design more
practical for language studies is that the researcher has more control over the
collection of data in terms of scheduling and deciding who will participate.
However, the designs are weaker, and the results must be interpreted very
carefully. Three types of quasi-experimental designs will be presented here:
pretest-posttest designs (without control group), time series designs (without
control group), and nonequivalent groups designs.

The pretest-posttest design without control group is like the pretest-posttest
design discussed above except that it lacks a control group. Such a design is
shown in Figure 3. This type of design could be used as follows: A general
proficiency pretest could be given at the beginning of a language program (the
treatment) and again as a posttest at the end of the program. If there is a large
gain in average scores between the beginning and end of the program, it might
be judged as a success. However, because there is no control group in such a
study, the researcher can never know for sure that the gain was not a result of
language exposure outside of the program, or a result of a testing effect (that is,
the effect of having taken the test twice), or a result of some other
undetermined factor. In other words, the observed gains in scores may have been due
to factors other than the learning that took place in the program.



Figure 3 

Quasi-Experimental Design, Pretest and Posttest 

Time series designs are more elaborate versions of the pretest-posttest design.
The only striking difference is that, in lieu of one pretest and one posttest, a series
of observations, or tests, is made. Then, a treatment is inserted in the middle of
this series. Such a design is described in Figure 4 (in which “O” stands for
Observation). In a time series design, the researcher can claim that the potential
consequences of the testing effect mentioned above are controlled in that all students
are made thoroughly familiar with the format and content types on the observation
instruments long before the treatment comes into the picture. One problem that
arises with this type of design is that it sometimes calls for the development of
numerous instruments, all of which must be very similar in what they measure.



Figure 4 

Quasi-Experimental Design, Time Series 

The nonequivalent groups design is different from the true experimental
pretest-posttest design only in that the subjects are not randomly selected into the
experimental and control groups. Such a design is shown in Figure 5. Because the
groups are not randomly selected, they cannot be assumed to be equivalent at the
beginning of the study. As a result, the equivalence of the groups must be checked
in Step 2 (or otherwise controlled statistically). If it is possible to set up a control
group in this manner and the groups do indeed prove to be equivalent at the
beginning of the study, the nonequivalent groups design can prove fairly
powerful. However, if such a control group cannot be established, the
quasi-experimental version of the pretest-posttest design (Figure 3) may be the most effective
design that can be used.



Figure 5 

Quasi-Experimental Design, Nonequivalent Groups

There are many other types of complex designs (see Campbell & Stanley,
1963, or Tuckman, 1978), and numerous ways of grouping and analyzing the
results of those designs (see for instance, Keppel, 1973; Kirk, 1968; Pedhazur,
1982; Tabachnick & Fidell, 1989).



Validity 

The validity of a study can be defined as the degree to which the results can be 
accurately interpreted and effectively generalized. The first part of this defini- 
tion — the degree to which the results can be accurately interpreted — is often 
referred to as internal validity. The second part, the degree to which the results 
can be generalized, is often labeled external validity. Table 1 lists the different 
factors that can affect the validity of a study (after Campbell & Stanley, 1963).

Table 1

Factors Threatening Validity (after Campbell & Stanley, 1963)

Type of Validity     Factor

Internal Validity     1. History
                      2. Maturation
                      3. Testing
                      4. Instrumentation
                      5. Statistical regression
                      6. Selection bias
                      7. Experimental mortality
                      8. Selection-maturation interaction

External Validity     9. Reactive effects of testing
                     10. Interaction of selection biases and the treatment
                     11. Reactive effects of experimental arrangements
                     12. Multiple treatment interference

Internal Validity

The eight threats to internal validity, listed above, are variables that must be
controlled in designing a study so that the results can be accurately interpreted.

History includes anything that happens to the subjects, other than the
intended treatment, between the observations in a study. For example, for the
design shown in Figure 2 (True Experimental Design, Pretest and Posttest),
history would be anything, other than the treatment, that occurs between the
pretest and posttest for either the experimental or control group.

Maturation refers to any of the processes in the subjects’ lives that occur
because of the passage of time and might interfere with interpretation of the results of
a study. For instance, fatigue, hunger, aging, changing schools, or passage through
puberty would all be maturation factors that the researcher might want to consider.

Testing describes any influence that taking one test has on the scores of
another test. For instance, taking the pretest shown in Figure 2 might affect the
scores on the posttest. The testing effect might be particularly pronounced if the
type of test involved were completely new to the subjects involved. Consider a
group of subjects who had never taken a cloze test before. If one were
administered as a pretest, the subjects might learn test-taking strategies that would
make them more comfortable and make them score higher on a subsequent
posttest, regardless of any treatment that was administered.

Instrumentation involves the impact of variations in the tools of
measurement or problems with the reliability of those tools (for much more on this latter
topic, see Brown, 1995a, 1995b) on the obtained measurements, or scores. For
example, a problem of instrumentation would arise if version A of a test was
used in the pretest, but version B was used on the posttest. The problem is that
any differences in performance could be due to discrepancies in the versions of
the test (the instruments) rather than to any treatment involved.

Statistical regression describes the moderating effects of selecting groups 
with extreme scores, eitlier very high, very low, or both. Under such conditions, 
the probability is tliat students with high scores will tend to score lower (i.e., 
closer to the average score), while students with very low scores will tend to 
score higher (i.e., closer to the average score) for reasons having nothing to do 
with any treatments involved. 
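
The effect described above can be illustrated with a small simulation. The
following is only a sketch under stated assumptions, not part of the original
discussion: it assumes that each observed score is a stable ability plus random
measurement error, and the function name, sample size, and score distributions
are all hypothetical. No treatment is applied between the two tests, yet the
extreme groups still drift toward the average.

```python
import random
import statistics

def simulate_regression(n_students=2000, seed=0):
    """Simulate two test administrations with no treatment in between."""
    rng = random.Random(seed)
    # Assumed model: observed score = stable ability + random measurement error.
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    pretest = [a + rng.gauss(0, 5) for a in ability]
    posttest = [a + rng.gauss(0, 5) for a in ability]

    # Rank students by pretest score and take the extreme 10% at each end.
    order = sorted(range(n_students), key=lambda i: pretest[i])
    k = n_students // 10
    low, high = order[:k], order[-k:]

    def group_mean(scores, members):
        return statistics.mean(scores[i] for i in members)

    return {
        "low_pre": group_mean(pretest, low),
        "low_post": group_mean(posttest, low),
        "high_pre": group_mean(pretest, high),
        "high_post": group_mean(posttest, high),
    }

result = simulate_regression()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

In such a simulation the low group's posttest mean rises and the high group's
posttest mean falls, even though nothing was taught between the two tests.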

Selection bias describes the impact of selecting subjects into the groups of a 
study for reasons other than chance. For instance, if the subjects for the
experimental group in Table 1 were selected from students in 8:00 a.m. ESL classes,
while the subjects in the control group were selected from students in 4:00 p.m. 
ESL classes, there might be differences in the groups based on class time prefer- 
ence that have nothing to do with the treatment involved. 
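
One common safeguard against this threat is to let chance alone decide group
membership. The sketch below shows one way that might be done; the function
name, the subject labels, and the pool of forty students are hypothetical, and
the approach assumes the researcher is free to assign individual students
rather than intact classes.

```python
import random

def assign_randomly(subjects, seed=None):
    """Shuffle the subject pool and split it into two groups by chance alone."""
    rng = random.Random(seed)
    pool = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(pool)
    half = len(pool) // 2
    return {"experimental": pool[:half], "control": pool[half:]}

# Hypothetical pool mixing students from the 8:00 a.m. and 4:00 p.m. classes.
students = [f"am_{i}" for i in range(20)] + [f"pm_{i}" for i in range(20)]
groups = assign_randomly(students, seed=42)
print(len(groups["experimental"]), len(groups["control"]))
```

Because every student has the same chance of landing in either group, any
class time preference should be spread across both groups rather than
concentrated in one.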

Experimental mortality refers to the influence of subjects dropping out of one
or more of the groups in a study. For example, in Table 1, some subjects in the
control group might be present for the pretest but absent for the posttest. These
absences might cause differences in the results that had nothing to do with the
treatment.
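
A researcher might gauge the seriousness of this threat by comparing the
pretest scores of subjects who dropped out with those of subjects who stayed.
The following sketch is one possible way to do so; the function name, the
subject identifiers, and all of the scores are invented for illustration only.

```python
import statistics

def attrition_report(pretest, posttest):
    """Summarize dropout for one group.

    pretest:  dict mapping subject ID -> pretest score
    posttest: dict mapping subject ID -> posttest score (dropouts are absent)
    """
    completers = [s for s in pretest if s in posttest]
    dropouts = [s for s in pretest if s not in posttest]
    report = {
        "n_pretest": len(pretest),
        "n_dropouts": len(dropouts),
        "dropout_rate": len(dropouts) / len(pretest),
    }
    # Means are only computed for subgroups that actually have members.
    if completers:
        report["completer_pre_mean"] = statistics.mean(
            pretest[s] for s in completers
        )
    if dropouts:
        report["dropout_pre_mean"] = statistics.mean(
            pretest[s] for s in dropouts
        )
    return report

# Invented control-group data: s1 and s4 missed the posttest.
control_pre = {"s1": 40, "s2": 55, "s3": 62, "s4": 35, "s5": 58}
control_post = {"s2": 57, "s3": 65, "s5": 60}
report = attrition_report(control_pre, control_post)
print(report)
```

If the dropouts' pretest mean differs markedly from the completers' mean, as it
does in these invented data, the posttest comparison may reflect who remained
in the group rather than any effect of the treatment.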

Selection-maturation interaction describes the effect of the maturation and 
selection bias variables (defined above) acting together. 

External Validity 

The four threats to external validity listed above can affect the generalizability
of the results.

Reactive effects of testing and treatment describe the influence of taking a
pretest on the sensitivity of the subjects to the treatment. For instance, if the
treatment involved the use of cloze tests to practice reading prediction and the
pretest was also a cloze test, the pretest might affect the subjects’ sensitivity to
the treatment. In other words, the generalizability of the results might be in
question because the results depend on the use of a particular test.

Interaction of selection biases and the treatment. If there is some relationship
between the group from which the subjects were selected and the effects of the
treatment, interactions are said to exist between selection biases and the
treatment variable. In other words, in any study, there is the possibility that any
effects that are found are only true for the population from which the groups
were selected. It is also possible that the characteristics of that particular
population may cause the treatment to be effective where it would not be in another
population. In such a situation the selection bias would be interacting with the
treatment and thus affecting the generalizability of the results.

Reactive effects of experimental arrangements. This refers to the impact of the
fact that the treatment was applied under experimental conditions rather than real
world conditions. For example, some pedagogical techniques might appear to
work very well as a treatment under classroom conditions but have no
correspondingly beneficial effect on the students’ use of the language in the real world.
The generalizability of the results to the real world would be in question.

Multiple treatment interference refers to the effects of applying more than one
treatment to the same subjects. Under these conditions, the effects of one
treatment cannot be disentangled from the effects of others, and thus the results
cannot be generalized to situations that do not contain the multiple treatments.

Multiple Threats to Validity 

Unfortunately, threats to validity in research are seldom as simple as Table 1 
would suggest. This is because there may be numerous threats to validity oper- 
ating at the same time. Since the overall confusion caused by simultaneously 
having multiple threats to the validity of a study may well be synergistic, it is
crucial that researchers guard against and control any and all of these problem
factors, preferably while planning a study. (For more information on factors that
threaten the validity of a study and how to control them, see Brown, 1988;
Campbell & Stanley, 1963; Hatch & Lazaraton, 1991; Tuckman, 1978.) 



Ethics 

Ethics in social science research have been considered from a number of
perspectives. For an overview of this work see Kimmel (1988). Over the years,
various organizations associated with social sciences research have provided
guidelines for their memberships. For example, the American Psychological
Association has provided various kinds of guidelines for the ethical conduct of
research (American Psychological Association, 1953, 1981, 1982, & 1985).
According to Kimmel (1988), ethical problems in social sciences research may
have a number of the following characteristics:

1. The complexity of a single research problem can give rise to multiple ques- 
tions of proper behavior. 

2. Sensitivity to ethical issues is necessary but not sufficient for solving them.

3. Ethical problems are the results of conflicting values.

4. Ethical problems can relate to both the subject matter of the research and the
conduct of the research.

5. An adequate understanding of an ethical problem sometimes requires a broad
perspective based on the consequences of research.

6. Ethical problems involve both personal and professional elements.

7. Ethical problems can pertain to science (as a body of knowledge) and to 
research (conducted in such a way as to protect the rights of society and 
research participants).

8. Judgments about proper conduct lie on a continuum ranging from the clearly
unethical to the clearly ethical.

9. An ethical problem can be encountered as a result of a decision to conduct
a particular study or a decision not to conduct the study. 

Commandments 

In language-related research, some of the most important ethical and
professional issues might best be summed up by ten straightforward commandments
(adapted from Brown, 1984). These commandments cover the researcher’s ethical
and professional responsibilities with regard to the participants, analyses, and 
audience of a study: 

Participants 

I. Thou shalt not abuse thy subjects in any manner including abuses of their
persons, time, or effort, and thou shalt obtain thy subjects’ informed
consent if required by thy institution.

II. Thou shalt not abuse thy colleagues by collecting data from their students
without permission, or by using too much precious class time.

III. Thou shalt reward thy subjects’ and colleagues’ efforts at least by giving
them feedback or information on what happened in the study.

Analyses 

IV. Thou shalt guard against consciously or subconsciously modifying thy
data so that the results support thy views and prejudices.

V. Thou shalt select the appropriate statistical tests.

VI. Thou shalt check the assumptions that underlie all statistical tests.

Audience

VII. Thou shalt explain thy research clearly so that it can be understood by
thy readers.

VIII. Thou shalt organize thy report using conventional sections, headings,
and other conventions (see American Psychological Association, 1994) so
that thy readers can easily follow thy study.

IX. Thou shalt interpret thy results carefully, guarding against the temptation
to over-interpret, or generalize beyond that which thy results warrant.

Above All Else 

X. Thou shalt continue to learn, read, and grow as a researcher so that thou 
can better serve thy field.

Stating the ethical issues in the form of commandments may at first seem to be
intended as tongue-in-cheek humor, but these are not to be taken lightly. Indeed,
the entire enterprise of research in language studies hinges on cooperation
between subjects, colleagues, researchers, and readers. Researchers should avoid
contributing to the already abundant negative feelings about statistical research.



Conclusion 

This paper began with a discussion of the issues involved in sampling a group, 
or groups, of subjects to be used in a study. Then, data collection instruments 
were examined in terms of the four scales of measurement that can be used. 
Next, a number of research designs were explored. Then, the factors which may 
jeopardize the internal and external validity of language studies were surveyed. 
Finally, the ethical issues involved in collecting data, conducting research, and 
reporting the results were covered. In short, a great many crucial issues have
been covered in this paper, issues that must be thought through before
conducting a study. A little effort spent in the planning stages of a study can save
the enormous amount of energy necessary to recover if things begin to come
unraveled after the study has begun.